Amendments to the Claims 



Claim 1 (currently amended). A system for disambiguating speech input using one of
voice mode interaction, visual mode interaction, or a combination of voice mode and
visual mode interaction [[multimodal interaction]] with an application comprising:

a speech recognition component that receives recorded audio or speech input and 
generates: 

one or more tokens corresponding to the speech input; and 

for each of the one or more tokens, a confidence value indicative of the likelihood 
that [[the]] a given token correctly represents the speech input; 

a selection component that identifies, according to a selection algorithm, which two 
or more tokens are to be presented to a user as alternatives, wherein said alternatives
are words or tokens; 

one or more disambiguation components that [[perform said multimodal interaction to]]
present the alternatives to the user in one of voice mode, visual mode, or a
combination of voice mode and visual mode, and [[to receive a selection of alternatives
from]] receive an alternative selected by the user in one of voice mode, visual mode, or
a combination of voice mode and visual mode [[,]] [[wherein the multimodal
interaction allows input and output in voice and visual modes]]; and

an output interface that presents the selected alternative to an application as input. 

Claim 2 (cancelled). The system of claim 1, wherein the disambiguation components and 
the application reside on a single computing device.

Claim 3 (cancelled). The system of claim 1, wherein the disambiguation components and
the application reside on separate computing devices. 

Claim 4 (original). The system of claim 1, wherein the one or more disambiguation 
components perform said interaction by presenting the user with alternatives in a visual 
mode, and by receiving the user's selection in a visual mode. 

Claim 5 (original). The system of claim 4, wherein the disambiguation components 
present the alternatives to the user in a visual form and allow the user to select from 
among the alternatives using a voice input. 

Claim 6 (cancelled). The system of claim 1, wherein the one or more disambiguation 
components perform said interaction by presenting the user with alternatives in a visual 
mode, and by receiving the user's selection either in a visual mode, a voice mode, or a 
combination of visual mode and voice mode. 

Claim 7 (original). The system of claim 1, wherein the selection component filters the 
one or more tokens according to a set of parameters. 

Claim 8 (original). The system of claim 7, wherein the set of parameters is user specified. 

Claim 9 (cancelled). The system of claim 1, wherein the one or more disambiguation
components disambiguate the alternatives in plural iterative stages, whereby the first
stage narrows the alternatives to a number of alternatives that is smaller than that initially
generated by the selection component, but greater than one, and whereby the one or more
disambiguation components operate iteratively to narrow the alternatives in subsequent
iterative stages.

Claim 10 (cancelled). The system of claim 9, whereby the number of iterative stages is 
limited to a specified number.

Claim 11 (currently amended). A method of processing speech input using one of voice
mode interaction, visual mode interaction, or a combination of voice mode and visual
mode interaction [[multimodal interaction]] with an application comprising:

receiving a speech input from a user; 

determining whether the speech input is ambiguous; 

if the speech input is not ambiguous, then communicating a token representative 
of the speech input to an application as input to the application; and 

if the speech input is ambiguous:

[[performing a multimodal said interaction with the user whereby the user is
presented with plural alternatives and selects an alternative from among
the plural alternatives, wherein said alternatives are words or tokens,
wherein said multimodal interaction allows input and output in voice and
visual modes;]]

selecting two or more tokens to be presented to the user as alternatives, 
wherein said alternatives are words or tokens;

presenting the alternatives to the user in one of voice mode, visual mode, 
or a combination of voice and visual mode, and receiving a selection of an 
alternative from the user in one of voice mode, visual mode, or a 
combination of voice mode and visual mode; and, 

communicating the selected alternative to the application as input to the 
application.

Claim 13 (original). The method of claim 12, wherein the interaction comprises the user
selecting from among the plural alternatives using a combination of speech and visual- 
based input. 

Claim 14 (original). The method of claim 11, wherein the interaction comprises the user 
selecting from among the plural alternatives using visual input. 